AITopics | unstructured data

unstructured data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Revealing Multimodal Causality with Large Language Models

Neural Information Processing Systems | Jun-13-2026, 07:57:00 GMT

Uncovering cause-and-effect mechanisms from data is fundamental to scientific progress. While large language models (LLMs) show promise for enhancing causal discovery (CD) from unstructured data, their application to the increasingly prevalent multimodal setting remains a critical challenge. Even with the advent of multimodal LLMs (MLLMs), their efficacy in multimodal CD is hindered by two primary limitations: (1) difficulty in exploring intra- and inter-modal interactions for comprehensive causal variable identification; and (2) insufficiency to handle structural ambiguities with purely observational data. To address these challenges, we propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. It consists of three key components: (1) a novel contrastive factor discovery module to identify genuine multimodal factors based on the interactions explored from contrastive sample pairs; (2) a statistical causal structure discovery module to infer causal relationships among discovered factors; and (3) an iterative multimodal counterfactual reasoning module to refine the discovery outcomes iteratively by incorporating the world knowledge and reasoning capabilities of MLLMs. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLLM-CD in revealing genuine factors and causal relationships among them from multimodal unstructured data.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.59)

Data readiness for agentic AI in financial services

MIT Technology Review | May-14-2026, 13:00:00 GMT

The success of agentic AI in financial services depends not just on smarter models, but on an authoritative context data store--one that is accessible, reliable, and governed at scale. Financial services companies have unique needs when it comes to business AI. They operate in one of the most highly regulated sectors while responding to external events that are updated by the second. As a result, the success of agentic AI in financial services depends less on the sophistication of the system and more on the quality, security, and accessibility of the data it relies on. "It all starts with the data," says Steve Mayzak, global managing director of Search AI at Elastic. Agentic AI--systems that can independently plan and take actions to complete tasks, rather than simply generate responses--holds enormous potential for financial services due to its ability to incorporate real-time data and optimize complex workflows.

agentic ai, artificial intelligence, real time system, (13 more...)

MIT Technology Review

Industry: Banking & Finance > Financial Services (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.99)
Information Technology > Architecture > Real Time Systems (0.90)

From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When

Neural Information Processing Systems | Mar-18-2026, 19:01:20 GMT

Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities, allowing them to make predictions for new tasks based on prompt exemplars without parameter updates. While existing ICL theories often assume structured training data resembling ICL tasks (e.g., x-y pairs for linear regression), LLMs are typically trained unsupervised on unstructured text, such as web content, which lacks clear parallels to tasks like word analogy. To address this gap, we examine what enables ICL in models trained on unstructured data, focusing on critical sequence model requirements and training data structure. We find that many ICL capabilities can emerge simply from co-occurrence of semantically related word pairs in unstructured data; word analogy completion, for example, can provably arise purely through co-occurrence modeling, using classical language models like continuous bag of words (CBOW), without needing positional information or attention mechanisms. However, positional information becomes crucial for logic reasoning tasks requiring generalization to unseen tokens. Finally, we identify two cases where ICL fails: one in logic reasoning tasks that require generalizing to new, unseen patterns, and another in analogy completion where relevant word pairs appear only in fixed training positions. These findings suggest that LLMs' ICL abilities depend heavily on the structural elements within their training data.
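
The co-occurrence claim in this abstract can be illustrated with a toy sketch. It is not the paper's CBOW model: it swaps in raw sentence-level co-occurrence counts, and the corpus, the `cooccurrence` and `complete` helpers, and the prompt are all hypothetical. The point is only that an analogy-style prompt can be completed from a bag of words, with no positional information.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy corpus: each "sentence" is an unordered bag of tokens in
# which a country co-occurs with its capital plus one noise word.
CORPUS = [
    ["france", "paris", "river"],
    ["france", "paris", "wine"],
    ["germany", "berlin", "river"],
    ["germany", "berlin", "beer"],
    ["italy", "rome", "wine"],
    ["italy", "rome", "river"],
]


def cooccurrence(corpus):
    """Count how often each ordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in corpus:
        for a, b in permutations(set(sentence), 2):
            counts[(a, b)] += 1
    return counts


def complete(prompt, counts, vocab):
    """Pick the unseen word that co-occurs most often with the last prompt word.

    Word order inside the corpus sentences is never used, only co-occurrence.
    """
    query = prompt[-1]
    candidates = sorted(vocab - set(prompt))
    return max(candidates, key=lambda w: counts[(query, w)])


counts = cooccurrence(CORPUS)
vocab = {word for sentence in CORPUS for word in sentence}
prompt = ["france", "paris", "germany", "berlin", "italy"]
prediction = complete(prompt, counts, vocab)
print(prediction)
```

In this toy setting "rome" shares two sentences with "italy" while each noise word shares at most one, so the count alone is enough to finish the pattern.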

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Using unstructured data to fuel enterprise AI success

MIT Technology Review | Jan-8-2026, 13:00:00 GMT

Organizations have a wealth of unstructured data that most AI models can't yet read. Preparing and contextualizing this data is essential for moving from AI experiments to measurable results. Enterprises are sitting on vast quantities of unstructured data, from call records and video footage to customer complaint histories and supply chain signals. Yet this invaluable business intelligence, estimated to make up as much as 90% of the data generated by organizations, historically remained dormant because its unstructured nature makes analysis extremely difficult. But if managed and centralized effectively, this messy and often voluminous data is not only a precious asset for training and optimizing next-generation AI systems, enhancing their accuracy, context, and adaptability; it can also deliver profound insights that drive real business outcomes. A compelling example of this can be seen in the US NBA basketball team the Charlotte Hornets, who successfully leveraged untapped video footage of gameplay--previously too copious to watch and too unstructured to analyze--to identify a new competition-winning recruit.

charlotte hornet, footage, unstructured data, (12 more...)

MIT Technology Review

Country:

North America > United States > Massachusetts (0.05)
Asia > Middle East > Jordan (0.05)
Asia > China > Beijing > Beijing (0.05)

Industry: Leisure & Entertainment > Sports > Basketball (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Natural Language (0.98)

Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies

Neural Information Processing Systems | Dec-24-2025, 03:26:35 GMT

Estimating the effect of intervention from observational data while accounting for confounding variables is a key task in causal inference. Oftentimes, the confounders are unobserved, but we have access to large amounts of additional unstructured data (images, text) that contain valuable proxy signal about the missing confounders. This paper argues that leveraging this unstructured data can greatly improve the accuracy of causal effect estimation. Specifically, we introduce deep multi-modal structural equations, a generative model for causal effect estimation in which confounders are latent variables and unstructured data are proxy variables. This model supports multiple multimodal proxies (images, text) as well as missing data. We empirically demonstrate that our approach outperforms existing methods based on propensity scores and corrects for confounding using unstructured inputs on tasks in genomics and healthcare.
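
To see why confounding matters for effect estimation, here is a small worked sketch. It is not the paper's deep structural-equation model: the paper treats the confounder as latent and infers it from unstructured proxies, whereas this example assumes the confounder `z` is directly observed. The `CELLS` data and all helper names are hypothetical.

```python
# Hypothetical aggregated data, one row per (confounder z, treatment t) cell:
# (z, t, number of units, mean outcome).
CELLS = [
    (0, 0, 80, 1.0),
    (0, 1, 20, 2.0),
    (1, 0, 20, 3.0),
    (1, 1, 80, 4.0),
]


def group_mean(cells, t, z=None):
    """Weighted mean outcome for treatment t, optionally within stratum z."""
    selected = [(n, y) for (cz, ct, n, y) in cells
                if ct == t and (z is None or cz == z)]
    total = sum(n for n, _ in selected)
    return sum(n * y for n, y in selected) / total


def naive_effect(cells):
    """Difference in means that ignores the confounder."""
    return group_mean(cells, 1) - group_mean(cells, 0)


def adjusted_effect(cells):
    """Stratify on z, then average per-stratum effects weighted by P(z)."""
    total = sum(n for _, _, n, _ in cells)
    effect = 0.0
    for z in sorted({cz for cz, _, _, _ in cells}):
        weight = sum(n for cz, _, n, _ in cells if cz == z) / total
        effect += weight * (group_mean(cells, 1, z) - group_mean(cells, 0, z))
    return effect


print(naive_effect(CELLS), adjusted_effect(CELLS))
```

With these numbers the treatment raises the outcome by 1.0 inside each stratum, yet the naive difference comes out near 2.2 because treated units are concentrated in the high-outcome stratum. Stratifying on `z` recovers 1.0.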

causal effect estimation, deep multi-modal structural equation, name change, (5 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.61)

Technology: Information Technology > Artificial Intelligence (0.41)

Cross-Modal Temporal Fusion for Financial Market Forecasting

Pei, Yunhua, Cartlidge, John, Mandal, Anandadeep, Gold, Daniel, Marcilio, Enrique, Mazzon, Riccardo

arXiv.org Artificial Intelligence | Nov-4-2025

Accurate forecasting in financial markets requires integrating diverse data sources, from historical prices to macroeconomic indicators and financial news. However, existing models often fail to align these modalities effectively, limiting their practical use. In this paper, we introduce a transformer-based deep learning framework, Cross-Modal Temporal Fusion (CMTF), that fuses structured and unstructured financial data for improved market prediction. The model incorporates a tensor interpretation module for feature selection and an auto-training pipeline for efficient hyperparameter tuning. Experimental results using FTSE 100 stock data demonstrate that CMTF achieves superior performance in price direction classification compared to classical and deep learning baselines. These findings suggest that our framework is an effective and scalable solution for real-world cross-modal financial forecasting tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA251474

arXiv: 2504.13522

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Revealing Multimodal Causality with Large Language Models

Li, Jin, Wang, Shoujin, Zhang, Qi, Liu, Feng, Liu, Tongliang, Cao, Longbing, Yu, Shui, Chen, Fang

arXiv.org Artificial Intelligence | Oct-31-2025

Uncovering cause-and-effect mechanisms from data is fundamental to scientific progress. While large language models (LLMs) show promise for enhancing causal discovery (CD) from unstructured data, their application to the increasingly prevalent multimodal setting remains a critical challenge. Even with the advent of multimodal LLMs (MLLMs), their efficacy in multimodal CD is hindered by two primary limitations: (1) difficulty in exploring intra- and inter-modal interactions for comprehensive causal variable identification; and (2) insufficiency to handle structural ambiguities with purely observational data. To address these challenges, we propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. It consists of three key components: (1) a novel contrastive factor discovery module to identify genuine multimodal factors based on the interactions explored from contrastive sample pairs; (2) a statistical causal structure discovery module to infer causal relationships among discovered factors; and (3) an iterative multimodal counterfactual reasoning module to refine the discovery outcomes iteratively by incorporating the world knowledge and reasoning capabilities of MLLMs. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLLM-CD in revealing genuine factors and causal relationships among them from multimodal unstructured data.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

arXiv: 2509.17784

Country:

Europe (0.27)
Asia (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.70)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

A Case for Computing on Unstructured Data

Sadia, Mushtari, Chowdhury, Amrita Roy, Chen, Ang

arXiv.org Artificial Intelligence | Sep-19-2025

Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.
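
The three stages named in this abstract (extraction, transformation, projection) can be sketched on a toy example. This is only an illustration of the general idea, not MXFlow: the input notes, the regular expression, and the `extract`, `transform`, and `project` functions are all hypothetical, and a real system would presumably use far richer extractors than a regex.

```python
import re
from collections import defaultdict

# Hypothetical unstructured input: free-text order notes.
NOTES = [
    "Alice ordered 3 widgets on Monday.",
    "Bob ordered 5 widgets on Tuesday.",
    "Alice ordered 2 widgets on Friday.",
]

PATTERN = re.compile(r"(?P<name>\w+) ordered (?P<qty>\d+) widgets")


def extract(notes):
    """Stage 1: pull latent structure (name, quantity) out of free text."""
    rows = []
    for note in notes:
        match = PATTERN.search(note)
        if match:
            rows.append({"name": match.group("name"),
                         "qty": int(match.group("qty"))})
    return rows


def transform(rows):
    """Stage 2: structured computation, here a group-by sum per customer."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["name"]] += row["qty"]
    return dict(totals)


def project(totals):
    """Stage 3: render the structured result back into prose."""
    parts = [f"{name} ordered {qty} widgets in total"
             for name, qty in sorted(totals.items())]
    return "; ".join(parts) + "."


summary = project(transform(extract(NOTES)))
print(summary)
```

Keeping the stages as separate functions mirrors the bi-directional pipeline described above: the middle stage works only on structured rows, so it could be swapped for any conventional query or aggregation without touching the text-facing ends.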

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

arXiv: 2509.14601

Country: North America > United States > Michigan (0.14)

Genre: Research Report (0.50)

Industry: